Latent growth curve (LGC) models are, in a sense, just a different form of the very commonly used mixed model framework. In some ways they are more flexible, mostly because the standard structural equation modeling framework allows for indirect effects and other complex covariate relationships. In other ways, they are less flexible, e.g. they typically require balanced data and have a harder time estimating nonlinear relationships, handling data with many time points, and dealing with time-varying covariates. With appropriate tools, there is little one can’t do with the standard mixed model approach relative to the SEM approach, and interpretation would likely be easier. As such, I’d recommend sticking with the standard mixed model framework unless you really need the SEM approach.
To best understand a growth curve model, I still think it’s instructive to see it from the mixed model perspective, where most things can be interpreted in terms of what you already know from a standard linear model. We will use our GPA example from before.
As before, we assume the following for the student effects.
\[\mathcal{GPA} = (b_{\mathrm{intercept}} + \mathrm{intercept}_{\mathrm{student}}) + (b_{\mathrm{occ}} + \mathrm{occ}_{\mathrm{student}})\cdot \mathrm{occasion} + \epsilon\]
\[ \mathrm{intercept}_{\mathrm{student}} \sim \mathscr{N}(0, \tau)\] \[\mathrm{occ}_{\mathrm{student}} \sim \mathscr{N}(0, \varphi)\]
Thus the student effects are random, and specifically are normally distributed with mean of zero and some estimated standard deviation (\(\tau\), \(\varphi\) respectively). We consider these as unspecified, or latent, effects due to student.
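To make the data-generating process concrete, here is a minimal base R sketch that simulates long-format data from the model above. The parameter values are illustrative round numbers close to the lme4 estimates shown later, not results, and the two student effects are drawn independently for simplicity, since the equations above do not specify their covariance.

```r
set.seed(123)

n_students <- 200
occasion   <- 0:5

# Illustrative values only (roughly the lme4 estimates shown later)
b_intercept <- 2.6   # fixed intercept
b_occ       <- 0.1   # fixed effect of occasion
tau         <- 0.21  # SD of the student intercept deviations
varphi      <- 0.07  # SD of the student slope deviations
sigma       <- 0.21  # residual SD

# One latent intercept deviation and one slope deviation per student, drawn independently
intercept_student <- rnorm(n_students, mean = 0, sd = tau)
occ_student       <- rnorm(n_students, mean = 0, sd = varphi)

# Long format: one row per student-occasion combination
sim_long <- expand.grid(occasion = occasion, student = seq_len(n_students))
sim_long$gpa <- (b_intercept + intercept_student[sim_long$student]) +
  (b_occ + occ_student[sim_long$student]) * sim_long$occasion +
  rnorm(nrow(sim_long), mean = 0, sd = sigma)

head(sim_long)
```

Fitting lmer(gpa ~ occasion + (1 + occasion | student), data = sim_long) to data like this should roughly recover the values used to generate it.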
In SEM, latent variables are assumed to be normally distributed, usually with zero mean and some estimated variance, just like the random effects in mixed models. Through this, we can start to get a sense of random effects as latent variables (or vice versa). Indeed, mixed models have ties to many other kinds of models (e.g. spatial, additive), because they too add a ‘random’ component to the model in some fashion.
For those familiar with structural equation modeling (SEM), growth curve models will look a bit different from typical SEM, because we have to fix the factor loadings to specific values in order to make it work. This also leads to non-standard output relative to other SEM models, as there is nothing to estimate for the many fixed parameters.
More specifically, we’ll have a latent variable representing the random intercepts, as well as one representing the random slopes. The visualization looks like a factor analysis, with a factor we are calling the intercepts and a factor we’re calling the slopes. Unlike factor analysis, where the loadings are estimated, all loadings for the intercept factor are fixed to 1[^likematrix]. The loadings for the slope factor (the effect of time) are arbitrary, but should accurately reflect the time spacing, and it is typically good to start at zero, so that the intercept represents the value at the first time point.
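The fixed loadings can be written as a matrix, and multiplying that matrix by the latent means gives the model-implied average GPA at each occasion. A small sketch, assuming six equally spaced semesters coded 0 to 5 and using the latent means reported in the lavaan output later in this section:

```r
times <- 0:5

# Fixed loading matrix: intercept loadings all 1, slope loadings equal to the time codes
Lambda <- cbind(intercept = rep(1, length(times)), slope = times)
rownames(Lambda) <- paste0("semester_", times)

# Latent means, as reported in the lavaan output later in this section
alpha <- c(intercept = 2.598, slope = 0.106)

# Model-implied mean GPA per occasion (intercept + slope * time)
implied_means <- drop(Lambda %*% alpha)
round(implied_means, 3)
```

Because the first slope loading is 0, the implied mean at semester_0 is just the intercept mean, which is why starting the time codes at zero gives the intercept a clear meaning.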
As might be guessed from the above visualization, for the LGC our data needs to be in wide format, where each row represents a person and there are separate columns for each time point of the target variable, as opposed to the long format we used for the previous mixed model. We can use the spread function from tidyr to help with that (in newer versions of tidyr, spread is superseded by pivot_wider).
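library(dplyr)   # select(), rename_at(), vars(), %>%
library(tidyr)   # spread()
load('data/gpa.RData')   # loads the long-format gpa data frame
gpa_wide = gpa %>%
  select(student, sex, highgpa, occasion, gpa) %>%
  spread(key = occasion, value = gpa) %>%   # one gpa column per occasion, named 0-5
  rename_at(vars(`0`,`1`,`2`,`3`,`4`,`5`), function(x) glue::glue('semester_{x}'))
head(gpa_wide)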
student sex highgpa semester_0 semester_1 semester_2 semester_3 semester_4 semester_5
1 1 female 2.8 2.3 2.1 3.0 3.0 3.0 3.3
2 2 male 2.5 2.2 2.5 2.6 2.6 3.0 2.8
3 3 female 2.5 2.4 2.9 3.0 2.8 3.3 3.4
4 4 male 3.8 2.5 2.7 2.4 2.7 2.9 2.7
5 5 male 3.1 2.8 2.8 2.8 3.0 2.9 3.1
6 6 female 2.9 2.5 2.4 2.4 2.3 2.7 2.8
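If you would rather not depend on tidyr, base R's reshape() can do the same long-to-wide step. A minimal sketch on a tiny made-up data frame (toy values, not the GPA data):

```r
# Toy long-format data: two students, three occasions each
toy_long <- data.frame(
  student  = rep(1:2, each = 3),
  occasion = rep(0:2, times = 2),
  gpa      = c(2.3, 2.1, 3.0, 2.2, 2.5, 2.6)
)

# One row per student, one gpa column per occasion (gpa_0, gpa_1, gpa_2)
toy_wide <- reshape(toy_long, direction = "wide",
                    idvar = "student", timevar = "occasion",
                    v.names = "gpa", sep = "_")

# Match the naming used for gpa_wide
names(toy_wide) <- sub("^gpa_", "semester_", names(toy_wide))
rownames(toy_wide) <- NULL
toy_wide
```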
We’ll use lavaan for our excursion into LGC. lavaan requires its own model syntax, supplied as a character string, but it tries to keep to the style of R regression formulas. The names intercept and slope are arbitrary. The =~ operator denotes that the left-hand side is a latent variable, measured by the observed (manifest) variables on the right-hand side.
lgc_init_model = '
intercept =~ 1*semester_0 + 1*semester_1 + 1*semester_2 + 1*semester_3 + 1*semester_4 + 1*semester_5
slope =~ 0*semester_0 + 1*semester_1 + 2*semester_2 + 3*semester_3 + 4*semester_4 + 5*semester_5
'
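Typing out every loading gets tedious as the number of time points grows. The hypothetical helper below (make_lgc_syntax is not part of lavaan) builds the same model string with paste(), assuming the columns share a common prefix followed by the time code. Its optional equal_resid argument adds the shared-label residual variance lines used later in this section.

```r
# Hypothetical helper, not a lavaan function: build LGC syntax for any set of time codes
make_lgc_syntax <- function(prefix = "semester_", times = 0:5, equal_resid = FALSE) {
  vars <- paste0(prefix, times)
  lines <- c(
    # intercept factor: every loading fixed to 1
    paste("intercept =~", paste0("1*", vars, collapse = " + ")),
    # slope factor: loadings fixed to the time codes
    paste("slope =~", paste0(times, "*", vars, collapse = " + "))
  )
  if (equal_resid) {
    # a shared label ('resid') constrains all residual variances to be equal
    lines <- c(lines, paste0(vars, " ~~ resid*", vars))
  }
  paste(lines, collapse = "\n")
}

cat(make_lgc_syntax())
```

The resulting string could then be passed to growth() exactly like lgc_init_model.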
Now we’re ready to run the model. Note that lavaan has a specific function, growth, to use for these models. It doesn’t spare us any effort for the model syntax, but does make it unnecessary to set various arguments for the more generic sem and lavaan functions.
library(lavaan)
lgc_init = growth(lgc_init_model, data = gpa_wide)
summary(lgc_init)
lavaan 0.6-3 ended normally after 73 iterations
Optimization method NLMINB
Number of free parameters 11
Number of observations 200
Estimator ML
Model Fit Test Statistic 43.945
Degrees of freedom 16
P-value (Chi-square) 0.000
Parameter Estimates:
Information Expected
Information saturated (h1) model Structured
Standard Errors Standard
Latent Variables:
Estimate Std.Err z-value P(>|z|)
intercept =~
semester_0 1.000
semester_1 1.000
semester_2 1.000
semester_3 1.000
semester_4 1.000
semester_5 1.000
slope =~
semester_0 0.000
semester_1 1.000
semester_2 2.000
semester_3 3.000
semester_4 4.000
semester_5 5.000
Covariances:
Estimate Std.Err z-value P(>|z|)
intercept ~~
slope 0.002 0.002 1.629 0.103
Intercepts:
Estimate Std.Err z-value P(>|z|)
.semester_0 0.000
.semester_1 0.000
.semester_2 0.000
.semester_3 0.000
.semester_4 0.000
.semester_5 0.000
intercept 2.598 0.018 141.956 0.000
slope 0.106 0.005 20.338 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.semester_0 0.080 0.010 8.136 0.000
.semester_1 0.071 0.008 8.799 0.000
.semester_2 0.054 0.006 9.039 0.000
.semester_3 0.029 0.003 8.523 0.000
.semester_4 0.015 0.002 5.986 0.000
.semester_5 0.016 0.003 4.617 0.000
intercept 0.035 0.007 4.947 0.000
slope 0.003 0.001 5.645 0.000
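Most of the output is blank (fixed parameters with nothing to estimate), which is needless clutter, but we do get the same five parameter values we are interested in: the two fixed effects, the intercept and slope variances, and their covariance.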
Start with the ‘intercepts’:
Intercepts:
Estimate Std.Err z-value P(>|z|)
intercept 2.598 0.018 141.956 0.000
slope 0.106 0.005 20.338 0.000
It might seem odd to call your fixed effects ‘intercepts’, but it makes sense if we think of this as a multilevel model as depicted previously, where the random effects were broken out into their own separate model. The estimates here are pretty much spot on with our mixed model estimates.
library(lme4)
gpa_mixed = lmer(gpa ~ occasion + (1 + occasion | student), data=gpa)
summary(gpa_mixed)
Linear mixed model fit by REML ['lmerMod']
Formula: gpa ~ occasion + (1 + occasion | student)
Data: gpa
REML criterion at convergence: 261
Scaled residuals:
Min 1Q Median 3Q Max
-3.2695 -0.5377 -0.0128 0.5326 3.1939
Random effects:
Groups Name Variance Std.Dev. Corr
student (Intercept) 0.045193 0.21259
occasion 0.004504 0.06711 -0.10
Residual 0.042388 0.20588
Number of obs: 1200, groups: student, 200
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.599214 0.018357 141.59
occasion 0.106314 0.005885 18.07
Correlation of Fixed Effects:
(Intr)
occasion -0.345
# fixef(gpa_mixed)
Now let’s look at the variance estimates. By default, the LGC estimates a separate residual variance for each time point, which distinguishes the two approaches, though not necessarily so: we could constrain these variances to be identical in the LGC, or conversely allow them to be estimated separately in the mixed model framework. Just know that this is why the results are not identical, along with the fact that the default estimation approaches also differ (maximum likelihood for lavaan, REML for lmer).
Covariances:
Estimate Std.Err z-value P(>|z|)
intercept ~~
slope 0.002 0.002 1.629 0.103
Variances:
Estimate Std.Err z-value P(>|z|)
.semester_0 0.080 0.010 8.136 0.000
.semester_1 0.071 0.008 8.799 0.000
.semester_2 0.054 0.006 9.039 0.000
.semester_3 0.029 0.003 8.523 0.000
.semester_4 0.015 0.002 5.986 0.000
.semester_5 0.016 0.003 4.617 0.000
intercept 0.035 0.007 4.947 0.000
slope 0.003 0.001 5.645 0.000
VarCorr(gpa_mixed)
Groups Name Std.Dev. Corr
student (Intercept) 0.212587
occasion 0.067111 -0.098
Residual 0.205883
The differences provide some insight. By default, the LGC allows a different residual variance at each time point (heterogeneous variances). Mixed models by default assume the same residual variance at each time point, though most modeling packages can allow them to be estimated separately.
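One way to see what the residual structure does is through the model-implied covariance matrix of the six measurements, \(\Lambda \Psi \Lambda^\top + \Theta\), where \(\Lambda\) holds the fixed loadings, \(\Psi\) the latent (co)variances, and \(\Theta\) the residual variances. A sketch using the rounded estimates from the default LGC above (a separate residual variance per semester) and from the equal-residual model that follows; since \(\Theta\) is diagonal, it only enters the variances on the diagonal.

```r
# Model-implied covariance of the six GPA measures: Lambda Psi Lambda' + Theta
implied_cov <- function(Psi, theta, times = 0:5) {
  Lambda <- cbind(1, times)                    # fixed intercept and slope loadings
  Theta  <- diag(theta, nrow = length(times))  # diagonal residual variances
  Lambda %*% Psi %*% t(Lambda) + Theta
}

# Default LGC: a separate residual variance for every semester
Psi_het   <- matrix(c(0.035, 0.002,
                      0.002, 0.003), nrow = 2, byrow = TRUE)
theta_het <- c(0.080, 0.071, 0.054, 0.029, 0.015, 0.016)

# Equal-residual LGC: one residual variance shared by all semesters
Psi_eq   <- matrix(c( 0.045, -0.001,
                     -0.001,  0.004), nrow = 2, byrow = TRUE)
theta_eq <- 0.042

Sigma_het <- implied_cov(Psi_het, theta_het)
Sigma_eq  <- implied_cov(Psi_eq, theta_eq)

# Implied total GPA variance at each semester under each residual structure
round(cbind(heterogeneous = diag(Sigma_het), equal = diag(Sigma_eq)), 3)
```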
As an example, if we constrain the residual variances to be equal (by giving each one the same label), the two models produce essentially identical estimates.
model = "
intercept =~ 1*semester_0 + 1*semester_1 + 1*semester_2 + 1*semester_3 + 1*semester_4 + 1*semester_5
slope =~ 0*semester_0 + 1*semester_1 + 2*semester_2 + 3*semester_3 + 4*semester_4 + 5*semester_5
semester_0 ~~ residual*semester_0
semester_1 ~~ residual*semester_1
semester_2 ~~ residual*semester_2
semester_3 ~~ residual*semester_3
semester_4 ~~ residual*semester_4
semester_5 ~~ residual*semester_5
"
growthCurveModel = growth(model, data=gpa_wide)
summary(growthCurveModel)
lavaan 0.6-3 ended normally after 51 iterations
Optimization method NLMINB
Number of free parameters 11
Number of equality constraints 5
Number of observations 200
Estimator ML
Model Fit Test Statistic 191.409
Degrees of freedom 21
P-value (Chi-square) 0.000
Parameter Estimates:
Information Expected
Information saturated (h1) model Structured
Standard Errors Standard
Latent Variables:
Estimate Std.Err z-value P(>|z|)
intercept =~
semester_0 1.000
semester_1 1.000
semester_2 1.000
semester_3 1.000
semester_4 1.000
semester_5 1.000
slope =~
semester_0 0.000
semester_1 1.000
semester_2 2.000
semester_3 3.000
semester_4 4.000
semester_5 5.000
Covariances:
Estimate Std.Err z-value P(>|z|)
intercept ~~
slope -0.001 0.002 -0.834 0.404
Intercepts:
Estimate Std.Err z-value P(>|z|)
.semester_0 0.000
.semester_1 0.000
.semester_2 0.000
.semester_3 0.000
.semester_4 0.000
.semester_5 0.000
intercept 2.599 0.018 141.947 0.000
slope 0.106 0.006 18.111 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.smstr_0 (rsdl) 0.042 0.002 20.000 0.000
.smstr_1 (rsdl) 0.042 0.002 20.000 0.000
.smstr_2 (rsdl) 0.042 0.002 20.000 0.000
.smstr_3 (rsdl) 0.042 0.002 20.000 0.000
.smstr_4 (rsdl) 0.042 0.002 20.000 0.000
.smstr_5 (rsdl) 0.042 0.002 20.000 0.000
intrcpt 0.045 0.007 6.599 0.000
slope 0.004 0.001 6.387 0.000
Compare to the lme4 output, with the random effects shown as variances.
print(VarCorr(gpa_mixed), comp = 'Variance')
Groups Name Variance Corr
student (Intercept) 0.0451934
occasion 0.0045039 -0.098
Residual 0.0423879
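Because the equal-variance model is nested within the default LGC, the two can also be compared with a chi-square difference (likelihood ratio) test. A sketch using only the test statistics printed in the two lavaan summaries; lavaan's anova() method on the two fitted objects should give the same comparison directly.

```r
# Chi-square difference test for nested models, from the printed fit statistics
chisq_diff_test <- function(chisq_restricted, df_restricted, chisq_full, df_full) {
  stat <- chisq_restricted - chisq_full
  df   <- df_restricted - df_full
  c(stat = stat, df = df, p = pchisq(stat, df = df, lower.tail = FALSE))
}

# Restricted: one residual variance for all semesters (191.409 on 21 df)
# Full: a separate residual variance per semester (43.945 on 16 df)
res <- chisq_diff_test(191.409, 21, 43.945, 16)
res
```

A difference of about 147 on 5 degrees of freedom is very large, in line with the residual variances in the first LGC shrinking from 0.080 to about 0.015 over the semesters.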
How can we put these models on the same footing? Let’s take a step back and do a model with only random intercepts. In this case, time is an observed measure, and has no person-specific variability. Our graphical model now looks like the following.
We can do this by fixing the slope ‘factor’ to have zero variance (and zero covariance with the intercept). However, note also that in the LGC, each time point of the gpa outcome has its own unique (residual) variance. In the mixed model setting, by contrast, this is constant, i.e. we only have one estimate for the residual variance, and it does not vary by occasion. We deal with this in the LGC by giving the residual variance a label (resid) and applying that label to each time point, which constrains the variances to be equal.
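With the slope variance and covariance fixed at zero and a single residual variance, this model implies a compound-symmetric covariance matrix: every occasion has variance \(\psi + \theta\) and every pair of occasions has covariance \(\psi\), where \(\psi\) is the intercept variance and \(\theta\) the residual variance. A sketch plugging in the estimates this model produces below (0.0634 and 0.0580):

```r
psi_int <- 0.0634   # random intercept variance (from the output below)
theta   <- 0.0580   # shared residual variance (from the output below)
times   <- 0:5

Lambda <- cbind(intercept = 1, slope = times)
Psi    <- diag(c(psi_int, 0))   # slope variance and covariance fixed at 0
Sigma  <- Lambda %*% Psi %*% t(Lambda) + diag(theta, length(times))

# Correlation between any two occasions: the intraclass correlation
icc <- psi_int / (psi_int + theta)

round(Sigma, 4)
round(icc, 3)
```

Every off-diagonal entry is the same, which is exactly the random-intercept-only assumption: time enters only through the fixed slope.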
lgc_ran_int_model = '
intercept =~ 1*semester_0 + 1*semester_1 + 1*semester_2 + 1*semester_3 + 1*semester_4 + 1*semester_5
slope =~ 0*semester_0 + 1*semester_1 + 2*semester_2 + 3*semester_3 + 4*semester_4 + 5*semester_5
slope ~~ 0*slope # slope variance is zero
intercept ~~ 0*slope # no covariance
semester_0 ~~ resid*semester_0 # same residual variance for each time point
semester_1 ~~ resid*semester_1
semester_2 ~~ resid*semester_2
semester_3 ~~ resid*semester_3
semester_4 ~~ resid*semester_4
semester_5 ~~ resid*semester_5
'
Now all time points share a single residual variance estimate. Let’s run the LGC.
lgc_ran_int = growth(lgc_ran_int_model, data = gpa_wide)
summary(lgc_ran_int, nd=4) # increase the number of digits shown
lavaan 0.6-3 ended normally after 36 iterations
Optimization method NLMINB
Number of free parameters 9
Number of equality constraints 5
Number of observations 200
Estimator ML
Model Fit Test Statistic 338.824
Degrees of freedom 23
P-value (Chi-square) 0.000
Parameter Estimates:
Information Expected
Information saturated (h1) model Structured
Standard Errors Standard
Latent Variables:
Estimate Std.Err z-value P(>|z|)
intercept =~
semester_0 1.0000
semester_1 1.0000
semester_2 1.0000
semester_3 1.0000
semester_4 1.0000
semester_5 1.0000
slope =~
semester_0 0.0000
semester_1 1.0000
semester_2 2.0000
semester_3 3.0000
semester_4 4.0000
semester_5 5.0000
Covariances:
Estimate Std.Err z-value P(>|z|)
intercept ~~
slope 0.0000
Intercepts:
Estimate Std.Err z-value P(>|z|)
.semester_0 0.0000
.semester_1 0.0000
.semester_2 0.0000
.semester_3 0.0000
.semester_4 0.0000
.semester_5 0.0000
intercept 2.5992 0.0217 120.0471 0.0000
slope 0.1063 0.0041 26.1094 0.0000
Variances:
Estimate Std.Err z-value P(>|z|)
slope 0.0000
.smstr_0 (resd) 0.0580 0.0026 22.3607 0.0000
.smstr_1 (resd) 0.0580 0.0026 22.3607 0.0000
.smstr_2 (resd) 0.0580 0.0026 22.3607 0.0000
.smstr_3 (resd) 0.0580 0.0026 22.3607 0.0000
.smstr_4 (resd) 0.0580 0.0026 22.3607 0.0000
.smstr_5 (resd) 0.0580 0.0026 22.3607 0.0000
intrcpt 0.0634 0.0073 8.6605 0.0000
Compare it to the corresponding mixed model.
summary(lme4::lmer(gpa ~ occasion + (1|student), data=gpa))
Linear mixed model fit by REML ['lmerMod']
Formula: gpa ~ occasion + (1 | student)
Data: gpa
REML criterion at convergence: 408.9
Scaled residuals:
Min 1Q Median 3Q Max
-3.6169 -0.6373 -0.0004 0.6361 2.8310
Random effects:
Groups Name Variance Std.Dev.
student (Intercept) 0.06372 0.2524
Residual 0.05809 0.2410
Number of obs: 1200, groups: student, 200
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.599214 0.021696 119.8
occasion 0.106314 0.004074 26.1
Correlation of Fixed Effects:
(Intr)
occasion -0.469
Now we have essentially identical results. Next, let’s let the slope for occasion vary. We can just delete or comment out the syntax related to the slope (co)variance, while keeping the residual variance constant across time points.
lgc_ran_int_ran_slope_model = '
intercept =~ 1*semester_0 + 1*semester_1 + 1*semester_2 + 1*semester_3 + 1*semester_4 + 1*semester_5
slope =~ 0*semester_0 + 1*semester_1 + 2*semester_2 + 3*semester_3 + 4*semester_4 + 5*semester_5
# slope ~~ 0*slope # slope variance is zero
# intercept ~~ 0*slope # no covariance
semester_0 ~~ resid*semester_0 # same residual variance for each time point
semester_1 ~~ resid*semester_1
semester_2 ~~ resid*semester_2
semester_3 ~~ resid*semester_3
semester_4 ~~ resid*semester_4
semester_5 ~~ resid*semester_5
'
lgc_ran_int_ran_slope = growth(lgc_ran_int_ran_slope_model, data = gpa_wide)
summary(lgc_ran_int_ran_slope, nd=4) # increase the number of digits shown
lavaan 0.6-3 ended normally after 51 iterations
Optimization method NLMINB
Number of free parameters 11
Number of equality constraints 5
Number of observations 200
Estimator ML
Model Fit Test Statistic 191.409
Degrees of freedom 21
P-value (Chi-square) 0.000
Parameter Estimates:
Information Expected
Information saturated (h1) model Structured
Standard Errors Standard
Latent Variables:
Estimate Std.Err z-value P(>|z|)
intercept =~
semester_0 1.0000
semester_1 1.0000
semester_2 1.0000
semester_3 1.0000
semester_4 1.0000
semester_5 1.0000
slope =~
semester_0 0.0000
semester_1 1.0000
semester_2 2.0000
semester_3 3.0000
semester_4 4.0000
semester_5 5.0000
Covariances:
Estimate Std.Err z-value P(>|z|)
intercept ~~
slope -0.0014 0.0016 -0.8337 0.4045
Intercepts:
Estimate Std.Err z-value P(>|z|)
.semester_0 0.0000
.semester_1 0.0000
.semester_2 0.0000
.semester_3 0.0000
.semester_4 0.0000
.semester_5 0.0000
intercept 2.5992 0.0183 141.9471 0.0000
slope 0.1063 0.0059 18.1113 0.0000
Variances:
Estimate Std.Err z-value P(>|z|)
.smstr_0 (resd) 0.0424 0.0021 20.0000 0.0000
.smstr_1 (resd) 0.0424 0.0021 20.0000 0.0000
.smstr_2 (resd) 0.0424 0.0021 20.0000 0.0000
.smstr_3 (resd) 0.0424 0.0021 20.0000 0.0000
.smstr_4 (resd) 0.0424 0.0021 20.0000 0.0000
.smstr_5 (resd) 0.0424 0.0021 20.0000 0.0000
intrcpt 0.0449 0.0068 6.5992 0.0000
slope 0.0045 0.0007 6.3874 0.0000
summary(lme4::lmer(gpa ~ occasion + (1 + occasion|student), data=gpa))
Linear mixed model fit by REML ['lmerMod']
Formula: gpa ~ occasion + (1 + occasion | student)
Data: gpa
REML criterion at convergence: 261
Scaled residuals:
Min 1Q Median 3Q Max
-3.2695 -0.5377 -0.0128 0.5326 3.1939
Random effects:
Groups Name Variance Std.Dev. Corr
student (Intercept) 0.045193 0.21259
occasion 0.004504 0.06711 -0.10
Residual 0.042388 0.20588
Number of obs: 1200, groups: student, 200
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.599214 0.018357 141.59
occasion 0.106314 0.005885 18.07
Correlation of Fixed Effects:
(Intr)
occasion -0.345
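The random-intercept-only LGC is nested within the random-slope LGC, so the printed chi-square statistics can be compared. One caveat worth flagging: the slope variance is being tested at its boundary value of zero, so a naive chi-square reference distribution is conservative; a 50:50 mixture of \(\chi^2_1\) and \(\chi^2_2\) is the commonly used alternative when adding one random effect (with its covariance). A sketch using the statistics from the two summaries above:

```r
# Fit statistics printed by lavaan for the two nested LGCs
chisq_int_only  <- 338.824; df_int_only  <- 23   # slope variance and covariance fixed to 0
chisq_int_slope <- 191.409; df_int_slope <- 21   # slope variance and covariance free

lr_stat <- chisq_int_only - chisq_int_slope
df_diff <- df_int_only - df_int_slope

# Naive p-value: chi-square with 2 df
p_naive <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

# Boundary-adjusted p-value: 50:50 mixture of chi-square(1) and chi-square(2)
p_mixture <- 0.5 * pchisq(lr_stat, df = 1, lower.tail = FALSE) +
             0.5 * pchisq(lr_stat, df = 2, lower.tail = FALSE)

c(stat = lr_stat, df = df_diff, p_naive = p_naive, p_mixture = p_mixture)
```

Here the statistic is so large that the choice of reference distribution makes no practical difference, but the mixture version matters when the evidence is borderline.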
Note that the intercept-slope relationship in the LGC is expressed as a covariance. If we want correlation, we just ask for standardized output.
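The conversion itself is simple: divide the covariance by the product of the two latent standard deviations. A sketch with base R's cov2cor() and the rounded estimates from the output above:

```r
# Latent covariance matrix, using the rounded lavaan estimates above
Psi <- matrix(c( 0.0449, -0.0014,
                -0.0014,  0.0045),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("intercept", "slope"), c("intercept", "slope")))

# cov2cor() divides each covariance by the product of the corresponding SDs
r <- cov2cor(Psi)["intercept", "slope"]
round(r, 3)
```

With these rounded inputs the result is about -0.098, matching the lme4 correlation; the -0.0963 lavaan reports is computed from the unrounded estimates.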
summary(lgc_ran_int_ran_slope, nd = 4, standardized = TRUE)
lavaan 0.6-3 ended normally after 51 iterations
Optimization method NLMINB
Number of free parameters 11
Number of equality constraints 5
Number of observations 200
Estimator ML
Model Fit Test Statistic 191.409
Degrees of freedom 21
P-value (Chi-square) 0.000
Parameter Estimates:
Information Expected
Information saturated (h1) model Structured
Standard Errors Standard
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
intercept =~
semester_0 1.0000 0.2118 0.7170
semester_1 1.0000 0.2118 0.7100
semester_2 1.0000 0.2118 0.6709
semester_3 1.0000 0.2118 0.6132
semester_4 1.0000 0.2118 0.5508
semester_5 1.0000 0.2118 0.4920
slope =~
semester_0 0.0000 0.0000 0.0000
semester_1 1.0000 0.0669 0.2241
semester_2 2.0000 0.1337 0.4235
semester_3 3.0000 0.2006 0.5807
semester_4 4.0000 0.2674 0.6955
semester_5 5.0000 0.3343 0.7764
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
intercept ~~
slope -0.0014 0.0016 -0.8337 0.4045 -0.0963 -0.0963
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.semester_0 0.0000 0.0000 0.0000
.semester_1 0.0000 0.0000 0.0000
.semester_2 0.0000 0.0000 0.0000
.semester_3 0.0000 0.0000 0.0000
.semester_4 0.0000 0.0000 0.0000
.semester_5 0.0000 0.0000 0.0000
intercept 2.5992 0.0183 141.9471 0.0000 12.2724 12.2724
slope 0.1063 0.0059 18.1113 0.0000 1.5903 1.5903
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.smstr_0 (resd) 0.0424 0.0021 20.0000 0.0000 0.0424 0.4859
.smstr_1 (resd) 0.0424 0.0021 20.0000 0.0000 0.0424 0.4763
.smstr_2 (resd) 0.0424 0.0021 20.0000 0.0000 0.0424 0.4253
.smstr_3 (resd) 0.0424 0.0021 20.0000 0.0000 0.0424 0.3554
.smstr_4 (resd) 0.0424 0.0021 20.0000 0.0000 0.0424 0.2867
.smstr_5 (resd) 0.0424 0.0021 20.0000 0.0000 0.0424 0.2287
intrcpt 0.0449 0.0068 6.5992 0.0000 1.0000 1.0000
slope 0.0045 0.0007 6.3874 0.0000 1.0000 1.0000
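The Std.all column, which standardizes both the latent and observed variables, is what we will typically look at. For the intercept-slope covariance it is simply the correlation (-0.0963), close to the -0.098 estimated by lme4.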
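The Std.all values follow directly from the estimates. The implied total variance at time \(t\) is \(\mathrm{Var}(y_t) = \psi_{int} + 2t\,\psi_{int,slope} + t^2\psi_{slope} + \theta\); a loading is standardized by multiplying it by the latent SD and dividing by \(\sqrt{\mathrm{Var}(y_t)}\), and the residual variance is divided by \(\mathrm{Var}(y_t)\). A sketch reproducing the columns above from the rounded estimates (so expect small discrepancies in the third or fourth decimal):

```r
times  <- 0:5
psi_i  <- 0.0449    # intercept variance
psi_s  <- 0.0045    # slope variance
psi_is <- -0.0014   # intercept-slope covariance
theta  <- 0.0424    # shared residual variance

# Model-implied total variance of GPA at each occasion
var_y <- psi_i + 2 * times * psi_is + times^2 * psi_s + theta

std_all <- data.frame(
  semester  = paste0("semester_", times),
  intercept = 1 * sqrt(psi_i) / sqrt(var_y),      # standardized intercept loading
  slope     = times * sqrt(psi_s) / sqrt(var_y),  # standardized slope loading
  residual  = theta / var_y                       # standardized residual variance
)
std_all
```

The Std.lv column only rescales by the latent SD, which is why every intercept loading there is \(\sqrt{0.0449} \approx 0.212\).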